Bioinformatics of Brain Diseases
microarrays is the high cost of each experiment and the growing number of
probe designs that rely on low-specificity sequences [13]. These drawbacks
motivated researchers to develop a sequence-based technique: RNA-seq.
Nevertheless, microarrays remain a sensible choice when a large number of
samples must be profiled on a limited budget, or when expression profiles
need to be compared directly with data from another microarray platform.
8.2.2 RNA-seq Technologies
RNA sequencing (RNA-seq) is a technique used to detect and quantify mRNA
molecules in a biological sample consisting of millions of cells [14]. It uses
high-throughput sequencing not only to quantify gene expression but also to
identify alternatively spliced genes, detect allele-specific expression, and
more. RNA-seq can be applied to various types of RNA, such as mRNA, total
RNA, microRNA, single-cell RNA, and long noncoding RNA [15]. In an RNA-seq
experiment, the RNA is first isolated and converted to cDNA. The cDNA is
fragmented into short pieces, and a sequencing library is prepared, a step
that includes PCR amplification. Finally, the library is sequenced on an NGS
(next-generation sequencing) platform (Figure 8.1B). The resulting sequence
reads, stored in FASTQ format, are then aligned to a reference sequence [16].
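Because reads leave the sequencer as FASTQ files, a minimal reader helps show what each record holds. The Python sketch below assumes the common four-line layout (an `@` header, the sequence, a `+` separator, and a quality string) and Phred+33 quality encoding; the names `FastqRecord`, `read_fastq`, `phred_scores`, and `error_probabilities` are hypothetical and illustrative only. It is not a substitute for established parsers such as Biopython's `Bio.SeqIO`.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class FastqRecord:
    name: str      # read identifier (header line without the leading '@')
    sequence: str  # called bases, e.g. "ACGTN..."
    quality: str   # one ASCII-encoded quality character per base

    def phred_scores(self, offset: int = 33) -> List[int]:
        """Decode quality characters into Phred scores (Phred+33 assumed)."""
        return [ord(ch) - offset for ch in self.quality]

    def error_probabilities(self, offset: int = 33) -> List[float]:
        """Convert each Phred score Q into a base-call error probability 10^(-Q/10)."""
        return [10 ** (-q / 10) for q in self.phred_scores(offset)]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream laid out as four lines per read.

    Wrapped (multi-line) FASTQ is not supported, and an empty header line
    is treated as the end of the input.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Usage: one made-up read; 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
example = "@read1\nACGTNACGT\n+\nIIII#IIII\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.phred_scores())  # prints: read1 [40, 40, 40, 40, 2, 40, 40, 40, 40]
```

The low score at the uncalled base `N` illustrates why per-base quality, and not just the sequence, is carried into alignment and QC.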
There are several NGS platforms, of which Illumina (www.illumina.com) is
the most widely used. Other major platforms include Roche 454
(www.454.com), Pacific Biosciences (www.pacificbiosciences.com), Ion Torrent
(www.iontorrent.com), and SOLiD (www.invitrogen.com). These platforms
differ in their sequencing and detection chemistries, and each has its own
protocol. The choice of platform may depend on the required accuracy, the
number and length of the reads, whether RNA or DNA is sequenced, the amount
of sample material available, the cost, and the turnaround time [17]. RNA-seq
is an intricate, multistep process involving PCR amplification, fragmentation,
purification, and sequencing, and an error at any of these stages can make
the data unreliable. This is why quality control (QC) is an important aspect
of RNA-seq.
Quality control of the RNA itself is a critical step prior to library
preparation. To obtain high-quality RNA, it is essential to stabilize the
sample after collection, lyse it fully, and eliminate any potential DNA
contamination. QC matters after sequencing as well: RNA-seq data of poor
quality can dramatically bias the outcome of an analysis and lead to false
conclusions. In addition, biases related to GC content (guanine-cytosine
content), nucleotide composition, and transcriptome complexity can produce
flawed data [18]. Rigorous QC methods must therefore be applied to the raw
data before any downstream analysis [19].
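To make raw-data QC more concrete, the sketch below computes two per-read metrics mentioned above, GC content and base quality, and discards reads that fail simple thresholds. The thresholds (`min_mean_q=20`, `min_len=20`, `max_n_fraction=0.1`), the Phred+33 encoding, the `(sequence, quality)` tuple layout, and all function names are hypothetical, illustrative assumptions rather than values from this chapter or from any specific QC tool; dedicated programs such as FastQC report these metrics across an entire run.

```python
from typing import List, Sequence, Tuple

Read = Tuple[str, str]  # hypothetical layout: (sequence, Phred+33 quality string)


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T); N is ignored."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a read's quality string."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(
    reads: Sequence[Read],
    min_mean_q: float = 20.0,     # illustrative threshold, not a standard
    min_len: int = 20,            # illustrative threshold, not a standard
    max_n_fraction: float = 0.1,  # illustrative threshold, not a standard
) -> List[Read]:
    """Keep reads that are long enough, mostly called, and of adequate quality."""
    kept = []
    for sequence, quality in reads:
        if not sequence or len(sequence) < min_len:
            continue  # empty or shorter than the minimum length
        if sequence.upper().count("N") / len(sequence) > max_n_fraction:
            continue  # too many uncalled bases
        if mean_phred(quality) < min_mean_q:
            continue  # low average base quality
        kept.append((sequence, quality))
    return kept


# Usage with made-up reads; 'I' = Q40 and '#' = Q2 under Phred+33.
reads = [
    ("ACGT" * 5, "I" * 20),                   # passes; GC fraction 0.5
    ("GGCC" * 5, "#" * 20),                   # fails: mean quality 2
    ("N" * 5 + "ACGTACGTACGTACG", "I" * 20),  # fails: 25% uncalled bases
    ("ACGT", "IIII"),                         # fails: shorter than 20 bases
]
kept = filter_reads(reads)
print(len(kept), [gc_fraction(seq) for seq, _ in kept])  # prints: 1 [0.5]
```

Comparing the distribution of `gc_fraction` values across all reads with the distribution expected for the organism is one way such a check can reveal the GC-content bias described above.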
Unlike hybridization-based methods, RNA-seq uses sequence-based approaches
to determine transcripts directly. When the reads are aligned to the genome,
alternative splicing can be detected. Furthermore, SNPs and paralogous genes
can be identified with this technology. The background noise is relatively